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Abstract 

Markov chains have long been used for generating random variates from spatial 
point processes. Broadly speaking, these chains fall into two categories: Metropolis- 
Hastings type chains running in discrete time and spatial birth death chains running in 

K- 5 continuous time. These birth death chains only allow for removal of a point or addition 

of a point. In this work it is shown that the addition of transitions where a point is 
moved from one location to the other can aid in shortening the mixing time of the 
chain. Here the mixing time of the chain is analyzed through coupling, and use of the 
swap moves allows for analysis of a broader class of chains. Furthermore, these swap 

Oh moves can be employed in perfect sampling algorithms via the dominated Coupling 

from the Past procedure of Kendall and M0ller. This method can be applied to any 
pairwise interaction model with repulsion. In particular, an application to the Strauss 
process is developed in detail, and the swap chains are shown to be much faster than 

1-1 standard birth death chains. 

Keywords: Perfect simulation, coupling from the past, swap moves, birth death 
process, spatial point processes, Strauss process 
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1 Introduction 

J> Spatial point processes are in wide use in statistical modeling (see [16] for an overview). 

It is typical to model finite point processes as being absolutely continuous with respect 
to a Poisson point process. That is, they have a density f{x)/c where f(x) is an easily 
computable function but the normalizing constant c of the density is impractical to 
compute exactly. A Monte Carlo algorithm gains information about the density by 
studying random variates drawn from the distribution the density describes. 

In order to obtain such variates, a Markov chain is constructed whose station- 
ary distribution matches the target distribution. Metropolis-Hastings chains for this 
problem run in discrete time (see pH]), and the spatial birth death chain approach of 
Preston [23j runs in continuous time. In [6] it was noted that for some problems the 
Metropolis-Hastings approach appears faster than the continuous time methods. 

There is a drawback to all of these Markov chain Monte Carlo methods in that 
unless the mixing time of the Markov chain is known ahead of time, the quality of the 
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variates obtained is suspect. While heuristics such as the autocorrelation test (see [9]) 
can provide proof that a chain has not mixed, it is often difficult to establish the 
positive claim that a chain has mixed. 

Perfect simulation algorithms are a solution to this problem, they generate samples 
exactly from the desired distribution without the need to know the mixing time of a 
Markov chain. Kendall [19] showed how the Coupling from the Past (CFTP) idea of 
Propp and Wilson |24| could be used together with a spatial birth and death chain to 
obtain samples from area-interaction processes. Kendall and M0ller [20J showed how 
this method could be extended to any locally stable point process using a method they 
called dominated CFTP. They also considered perfect sampling Metropolis-Hastings 
chains, but restricted these chains to only adding or deleting a point at each step. 

So while [6] indicates that Metropolis-Hastings chains can beat continuous time 
spatial birth death chains, [20] shows how CFTP can be used with the latter chains. 
In this work a new swap move is introduced that speeds up convergence, while still 
allowing dominated CFTP to be used for perfect simulation. 

In Section 2.1 the theory behind spatial birth-death chains with the new swap move 
is developed. Section [3] describes the framework for how these chains are employed for 
locally stable processes, and uses these moves for several examples. Section [4] then 
reviews the use of dominated coupling from the past for these types of chains, and 
shows how the addition of swap moves fits into this protocol. In Section[5j the examples 
of Section [3] are simulated, both with and without the swap moves. Section [6] bounds 
the expected running time of the procedure for a restricted class of models. 



2 Spatial point processes 

Dyer and Greenhill [7] first introduced a swap move for hard core processes living in 
discrete spaces. Here their method is extended to more general point processes where 
the points can lie in continuous spaces and repulsion can be weaker than hard core. 

Following [3], the 'Carter-Prenter exponential space' will be used as the state space. 
Consider a separable measure space (S,B, A) where < X(S) < oo and B contains all 
singletons. For example, both finite spatial point processes and marked point processes 
can be included in this framework. 

For simple spatial data S is usually a bounded Borel set of R 2 and A is typically 
Lebesgue measure. Other information can then be added to the points as marks. For 
example, Harkness and Isham [13] studied locations of ants nests in a rectangular region 
R. Since there were two types of ants, S = R x {0, 1} and A is the cross product of 
Lebesgue measure and a measure on {0, 1}. 

As usual for continuous applications, suppose (Vs E 5)(A({s}) = 0). Now consider 
generating N as a Poisson random variable with parameter \(S). Then with ./V in hand, 
draw Xi, . . . , Xn iid from X(-)/X(S). Then {Xi, . . . , X^} (called a configuration) is a 
draw from a Poisson point process with intensity measure A(-) over S. Let [i be the 
distribution of {X%, . . . ,Xn}, and Q the set of all possible configurations. See [23] for 
more details and precise definitions of ji and Q. 

Now consider point processes that are absolutely continuous with respect to \i with 
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density / satisfying a local stability condition (as in [20]): 

(3K > 0)(Vx G n)(V« G S)(/(z + u) < tf/(x)), (1) 

where x+t> denotes the configuration xL){v}. Many point processes of interest meet this 
condition, including the area-interaction process [21|27], the Strauss process [HIES], 
and the continuous-random cluster model [2H [12] . 

2.1 Spatial birth death swap chains 

The presentation of the swap move given here follows the framework of Preston |23j, 
who introduced the use of continuous time spatial birth death chains for these problems. 
These chains are examples of jump processes, where at a given state x, the chain stays 
in the state for an exponential length of time with expected value given by l/a(x). 
The state then jumps to a new state using kernel K, so the probability the new state 
is in A is K.(x,A)) independent of the past history (see [8], Chapter X for the details 
of jump processes.) 

In the Preston framework, the rate of births (adding points to the configuration) 
and deaths (deleting points from the configuration) depends only on the current state: 

• There exists a nonnegative measurable birth rate function b from 0x5 equipped 
with the standard product cr-field to R with the Borel cr-field. 

• There exists a nonnegative measurable death rate function d from VtxS equipped 
with the standard product cr-field to R with the Borel cr-field. Furthermore, 
w G x d(x, w) > and w x d(x, to) = 0. 

To this birth death framework we now add a swap rate: 

• There exists a nonnegative measurable swap rate function s from 0, x S x S 
equipped with the standard product cr-field to R with the Borel cr-field. Further- 
more, w £ x => s(x, to, v) = 0. 

Here b(x,v) is the rate at which points v are added to x, d(x,w) is the rate at which 
point w is removed from x, and s(x, w, v) is the rate at which point to is removed and 
point v is added to x. In order to make the construction precise, from b, d and s the 
rate of change a(x) and kernel K for the Markov chain must be defined. 

For all A G B, let K b (x, A) = j veS b(x, v)l(x + v G A) A(d-o). When K b (x, O) < oo 
for all x in 0, a birth kernel can be defined: 

K b (x,A) = K b (x,A)/K b (x,n), (2) 

with rate r b (x) = f v£S b(x,v) X(dv). 

Similarly, Kd(x, A) = ^ u6J d(x, to)l(x — w G A) which always has a finite number 
of terms and so 

K d (x,A)=K d (x,A)/K d (x,n), (3) 

with rate r d (x) = ^2 v&x d(x,v). 

Lastly, set K s (x, A) = Ylwex I v eS s ( x > u? ' v )^-( x ~ w+v G A) A(dt>). When K s (x, O) < 
oo for all x G 0, let 

K s (x,A) = K s (x,A)/K s {x,n), (4) 
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and r s (x) = J2 w ex Ives s ( x,w,v ^ -M ^ 1 )- This makes the overall rate 

a{x) = r b (x) + r d (x) + r s (x), (5) 

and the overall kernel: 

K(x, A) = K b (x, Af-^+ K d (x, A) r -^l + +K s (x, A)^. (6) 
a{x) a(x) a(x) 

Harris recurrence guarantees that a Markov process has a unique invariant measure 
(see pQ for details of Harris recurrence in continuous time). Kaspi and Mandelbaum |17| 
showed that a continuous time chain is Harris recurrent if and only if there exists a 
nonzero cr-finite measure where the chain almost surely hits sets with positive measure. 

In particular, for all the chains here, the death rate equals the number of points in 
the configuration, and the birth rate is bounded above by a constant. This forces the 
chain to visit the empty configuration infinitely often, making it Harris recurrent. 

It is well-known (see Proposition 8.1 of |23J) that for jump processes with a unique 
invariant distribution, the density / is invariant if and only if for all A G T: 



a(z)f(z) Mz) = / K(y,A)a(y)f(y) dfx(y), (7) 
a Jn 

for a as in ([5]). Detailed balance conditions for time-reversibility of jump processes are: 

f(x)a{x)K(x,dy) dfi(x) = f(y)a{y)K(y,dx) d/x(y). (8) 

For moves from configurations with n points to those with n + 1 (or vice versa), the 
detailed balance conditions are satisfied |23l [25] when the rate of births balance the 
rate of deaths with respect to /. So 

f(x)b(x, v) = f{x + v)d(x + v, v). (9) 

Swap moves stay inside the same dimensional space, and it is straightforward to 
show that reversibility for swap moves holds when 

f(x)s(x, w, v) = f(x + v — w)s(x + v — w, v, w). (10) 



3 Applications 

The description of the continuous time birth death swap chain given here mirrors 
that of Kendall and M0ller [20] for Preston's continuous time birth death chains. In 
Section [4] their method of dominated coupling from the past (dCFTP) will be used 
with these chains to obtain a perfect simulation algorithm. Consider a locally stable 
point process satisfying ([l]) with density /. 
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Dominating process To put the framework of Preston into practice, let the dominat- 
ing process (D, M) be a marked spatial point process over times t £ (—00, 00), where 
D is a spatial birth death process with constant birth rate K and constant death rate 1 
over S. This makes the total birth rate equal to KX(S), and the total death rate equal 
to the number of points in the configuration. The process M contains an extra mark 
for each point in the process that is a uniform draw on [0, 1] that is independent of 
other marks, the point location, and the location of all previous points in the process. 
The following pseudocode generates the Poisson point process D(0). 

Start dominating process 

Input: K, A(-) 

Output: D(0) 

1) Draw N <- Pois(KX(S)) 

2) Let D(0) <- 

3) For i from 1 to Nq do 

4) Draw^A(-)/A(5) 

5) Let D(0) <- D(0) + v 

With the starting state, it is now possible to generate moves: births or deaths of 
points in the dominating chain. Exp(l/r) denotes an exponential random variable with 
mean r, and Exp(0) = 00 with probability 1. Unif(x) denotes a uniform draw from the 
points in the configuration x. 
Dominating process 

Input: x (starting configuration), n (number of events) 

Output: x (configuration after n events), A (list of births and deaths) 

1) Let t <- 

2) For i from 1 to n do 

3) Draw t b <- Exp(KX(S)), draw t d <- Exp(#x), draw M <- Unif([0, 1]) 

4) If tb < td (then there is a birth) 

5) Draw v <- X(-)/X(S), let x <- x + v 

6) Add row [t "birth" v M] to A 

7) Else (there is a death) 

8) Draw w <— Unif(x), let x ^— x — w 

9) Add row [t "death" w] to A 
10) Let t <- t + minjtf,, t^} 
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Figure 1: Sample run of Dominating process 
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With the dominating process in hand, it is possible to build any locally stable 
process X with stationary density / that satisfies X(t) C D(t) for all t as follows. 
Suppose X(T) C D(T) for a particular time T. Now consider a birth or death at time 
t > T. When a point v dies from D at time t, it also dies in X, and the X(t) C Z?(i) 
property is maintained. 

Let r(x, u) = K~ l f(x + v)/ f(x). If a point w is born at time £ into the dominating 
process, then the point is also born into X when the mark on v satisfies M{t) < r(x, v). 
By only accepting the birth in D as a birth in X with probability r(x, v), the birth rate 
for X is reduced from b(x,v) to b(x,v)r(x,v) = f{x + v)/ f{x). This last ratio is known 
as the Papangelou conditional intensity, and the method of only selecting certain births 
is known as thinning the process (see Appendix G of |22J for a careful construction of 
the thinning procedure.) 

Here f(x + v)/f(x) is the Papangelou conditional intensity, and the thinning prop- 
erty of Poisson processes means that this procedure yields a process X with birth rate 
b(x, v) = (K)(K~ 1 )f(x + v )/ f(x) = f(x + v)/ f(x). Together with the death rate of 1, 
this gives reversibility as in ([9]). 

Now to add the swap move. For repulsive processes, often f(x + v)/ f(x) is small, 
but there will exist w E x such that f(x — w + v)/ f(x). This is where it is natural to 
have a swap take place. Suppose there exist probabilities p(x, w, v) such that 

K-iK^ + J2p(x,™,v)<1, (11) 

and 

f(x)p(x, w, v) = f(x + v — w)p(x + v — w, v, w). (12) 

Number points in x as {w\,W2, ■ ■ ■ , w# x ). Set £k = [K" 1 f {x+v) / f (x)]+J2i=\ p(x, Wi, v), 
and u k = [K- 1 f(x + v)/f(x)] + J2^=iP{x,w l ,v). If M(v) < K~ 1 f(x + v)/ f{x), then v 
is added to the configuration, and if M(v) £ [i^, u^}, then v is added and is removed 
from the configuration. 

The probability of swapping from x to x + w — v is just p(x,w,v), while that of 
swapping from x + w — v to x is p{x + v — w,v,w), and detailed balance for swap moves 
is maintained. This procedure is now illustrated for two models. 

Strauss model In the Strauss model ([18j,[26j), the density has a factor which is 
exponential in the number of pairs of points that lie within distance R of each other 
Let p be a metric on S, then the density can be written: 

f s (x) = Z-\ M ^ X ^ X \ s(x)= Y, Hp(v,v')<R), (13) 

{v,v'}:v£x,v'£x\{v} 

where ■^(/3 1 ,/3 2 ,i : ?) i s the normalizing constant for the density. As noted in [18J, in order 
for Zm u fc,R) to be finite (and hence for the density to exist) fa must be at most 1. 

Typically S is a bounded subset of R d for d 6 {1, 2, 3}, and A is Lebesgue measure. 
Figure [2] illustrates a random draw from this density for A equal to Lebesgue measure 
and S = [0, l] 2 . This sample was generated using the techniques of the next section. 
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Figure 2: Strauss model with S = [0, l] 2 , (3 X = 100, (3 2 = 0.50, R = 0.1 



Since P2 < 1, X = /3i and births at f in the dominating process are accepted 
with probability fi^'^i where n(v, x) = Y^wex 1(p( v ' w ) — R) * s the number of points 
currently in the configuration within distance R of the proposed point v. 

Let Bern(p) denote a Bernoulli distribution with parameter p. One way to generate 



B ~ Bern(/?2 ^ v ' x ^) is to generate B w ~ Bern(/32) for each w within distance R of v and 
let B = Y\ B w . Only when every one of the neighbors draws a 1 is the birth accepted. 

Now suppose exactly one of the B w = while the rest are 1. This is the situation 
where a swap move can be used: point v is added to the state while point w is removed. 



iid 



For the Strauss process this makes p(x,w,v) = l(p(v,w) < R)(l — fa)^ 



n(v,x)—l 



SO 



equations (11) and (12) are satisfied. Therefore this is a legal swap move. 



The following pseudocode shows how a birth or death in the dominating chain can 
be used to make a move in the underlying chain with f$ as the stationary density. In 
order to run this chain, first run the dominating process chain with K = (3\. Then take 
each move ordered by time and apply the following procedure. Here the input variable 
move contains the type of move and the location of the point, and M is the uniform 
mark attached to a birth event. (Lines 8 and 10 ensure that after testing M against 
P21 the value of M is changed so once again M is uniform over [0, 1].) 
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Strauss birth death swap chain 

Input: move, M, X, Output: X 

1) If move = death of point w 

2) Let X <- X - w 

3) Else (there was birth at point v) 

4) Let A = {w G X : u) < i?} 

5) For each w£ Ado 

6) If M < fa 

7) Let A <- A \ w 

8) Let M <- M//3 2 

9) Else 

10) Let M <r- (1 -M)/(l -fa) 

11) If#A<l 

12) Let X ^ X + v - A 



Pairwise interaction point processes The Strauss process can be generalized to the 
pairwise interaction point process [18]. The new density is: 

g{*) = Pf x II ^v'}), 

for <p an arbitrary nonnegative function. For a Strauss process (j)({v, v'}) = ^ p ^ v,v ^~ ■ 
When 0(-) < 1, the same pseudocode as for the Strauss process can be used with 
a few changes. In line 4 set A = {w G X : cj)({v,w}) > 0}, and in lines 6, 8, and 10 
change fa to (p(v,w). 



4 Perfect simulation by dominated CFTP 

The approach used here for perfect simulation is dominated CFTP [20J. The basic 
idea is to consider a stationary process {X(i)}^ = _ 00 , and try to determine -X"(0), the 
configuration of the process at time 0. First note that the dominating process D is 
time reversible, and so can also be considered over t £ (— oo,0]. Fix T > 0. At 
this stage, D(—T) is known, but X(—T) is not. Because the D and X processes are 
coupled together, knowing the births and deaths used to update D for t G [— T, 0] 
can sometimes be used to evaluate X(0) without knowing X(—T). In this case, the 
known value of X(0) is returned by the algorithm. Otherwise, the algorithm considers 
a larger interval, such as [— 2T, 0]. First events in [— 2T, —T] are generated (the events 
in [— T, 0] are reused) and again the algorithm tries to find AT(0). The process repeats 
until X(0) is found. 

Now for the details. Suppose X is an underlying process with marked dominating 
process (D, M) as in Section [3j Fix a number of events N. Then moving backward in 
time from time 0, consider the first N births and deaths to occur in the dominating 
process. Draw -D(O) as a spatial Poisson point process with intensity K\. Then as the 
process moves backwards in time, birth and deaths in D are generated starting from 
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Figure 3: Sample run of Dominating process 



this initial configuration. Figure [3] illustrate what a set of moves A looks like after 
running the process backwards in time. 

Now the underlying process can only have births at the same time as births in the 
dominating process, and therefore the state of the chain at time —t is a collection of 
points X{—t) C D(—t) (which is why D is referred to as the dominating process.) As 
pointed out in [20], it is always possible to create upper and lower configurations U(t) 
and Lit) such that 

L(t) C X(t) C U(t) C D(t) for all t G (-oo, 0]. (14) 

In [20] this was called the sandwiching property. The process (L(t),U(t)) can also be 
thought of clS EL bounding process (see [II])- 

In particular, suppose that D{t) is known backwards in time through the first N 
events. Call the time where event number N occurred (moving backward in time) Tjy. 
Then set Ltv(tat) to the empty configuration, and Un(t~n) to D(tn). Every time there 
is an event at time t (either a birth or death in the dominating process) make sure that 
Um and L^ continue to bound X once the event updates the chain. Then a simple 
induction argument shows that Ljv(0) C A^v(0) C Un(0). Note if Ln(0) = Un{0), 
then X/v(0) is trapped between them and also equals this common value. This is the 
"coupling" part of CFTP. 

The "from the past" part of CFTP works as follows. Suppose that Ln(0) ^ Un{0). 
Then increase the value of N and try again. Let N' > N. The first N events for the 
dominating process have already been generated, these same events must be used in 
subsequent evaluations of the bounding process. Therefore, only N' — N additional 
events need to be generated. Once these events have been generated, run Ljv' and Un* 
forward until Ljv'(0) and Un'(0) can be compared. 

The updates must have the funneling property (see [H]): for all t' > t and N' > N 

L N (t) C L N ,(t) C U N ,{t) C U N {t) L N (t') C L N ,{t') C U N >{t') C U N (t'). (15) 

In our approach, U^it) C D(t) for all t and N. Moreover, Ln(tn) = and Un(tn) = 
D(tn). Therefore, for any N' > N, -Ltv(tat) C Ln>{tn) ^ Un'(t~n) C Un(tn)- 
Therefore, if L N (0) = U N (0) for some N, then L N ,(0) = U N >(0) for all N' > N 
as well, so it is not necessary to try every value of N. Propp and Wilson [23] noted 
the advantages of doubling N at each step, and in the pseudocode below that is how 
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the algorithm proceeds. The choice of N initia i is up to the user, but note that Lat(O) 
cannot equal f7j\r(0) unless every point in D(tn) has died by time 0. Therefore, a simple 
reasonable choice for Ninitiai is the expected number of points in the dominating process 
at a fixed time, \(S). 

Dominated coupling from the past 

Input: N initia i Output: X(0) 

1) Draw P <- Poisson(ETA(5')), let N' <- N inU ial, let N «- 

2) Draw P independent draws from A(-)/A(S) 

3) Repeat 

4) Let x,A<- Backward Dominating(x, N' — N) 

5) Let rows N + 1 through N' of table B be table A 

6) Generate (Ljv'i ^JV') starting from (0,x) using P 

7) Let N <- N', let AT' <- 2iV 

8) Until Lat(0) = U N (0) 

9) Let X(0) <- Lat(O) 

The Backward Dominating subroutine generates events backwards in time start- 
ing from a given state of the dominating chain. Note that dominated CFTP does not 
need to know the times of the events, and so unlike the Dominating Process code, 
the Backward Dominating procedure does not record times of events. 

Backward Dominating 

Input: x, N Output: x, A 

1) For i from 1 to iV do 

2) Draw t d <- Exp(KX(S)), draw t b «- Exp(#x), draw M i- Unif([0, 1]) 

3) If td < tj, (then there is a death) 

4) Draw v 4- A(-)/A(5), let x 4- x + v, add row ["death" v] to A 

6) Else (there is a death) 

7) Draw w <— Unif(x), let x <— x — w, add row ["birth" w M] to A 
Kendall and M0ller showed (Theorem 2.1 of [20J) that as long as the probability 

that D(t) visits the empty configuration in [0, t] goes to 1 as t goes to infinity, this 
procedure will terminate in finite time with probability 1. The resulting configuration 
Ln(0) = Ptv(0) is a draw exactly from the target distribution. 

Now consider the question: how should L and U be updated given a particular 
event in the dominating process so the tunneling property (15) holds? 



4.1 Updating the bounding process 

Suppose at time t that point w dies in the dominating process D(t). Then as points 
die in the dominating process, they die in the underlying process X(t) as well, and so 
having them die in C/jv(i) and £jv(i) means they still bound X(t). 

Births and swap moves affect the bounding process in the following fashion. Suppose 
that v is born, and marked with value M. Say that Y G [Lp{(t),Ujsr(t)] if Ljv(t) C 
Y C £/j\r(i). Then if there exists a Y such that Y G [Ljy(t), U]y(t)] and v is added or 
swapped into Y given M, then v is also added to Ujsf(t). Alternatively, if for all Y such 
that Y G [L^(t), Uff(t)], the point v is added or swapped into Y given M, then v is 
added to L^{t). 
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This setup ensures that if v is added to Ljv(i), it is also added to U^(t). For a 
swap, there are two possibilities. 

1. If for all Y such that Y E [Ljv(t), £7jv(t)] point w is swapped out of the process 
given M(t), then it is removed from C//v(i). 

2. If there exists a Y such that Y G [Ljv(t), ^iv(i)] and point to is swapped out of 
the process given M(t), then w is removed from L^{t). 

This is the best update possible for L/v(t) and U^it) that maintains both the sandwich 
and funneling properties. 

This method also satisfies equation (15). Consider running Ln,Un and Ljy>, Upp 
for some N' > N. After N' — N events running (Lw, Upp) forward in time to time tn, 
they will start using the same birth and death updates as (Lpj, Up/). Since Lp/^Tpj) = 
and Un(tn) = D{t~n), it is true that Lp/(rpj) C Lpp(Tpj) Q Un>{tn) Q Upj(tn)- 

In a birth event at time t, a point v is added to Up/'(t) if there exists Y E 
[Lffi(t),Upfr(t)] that would add the point v. The same Y also lies in [Lj^(t),Uff(t)], 
and so will be added to C7jv(*) as well. This preserves Ujp C L^v- Similarly, points 
added to L^v' will also be added to Ljy, and dead points are removed from all processes, 
therefore (15) holds for one event. A simple induction then verifies that it will hold for 
any finite number of events. 



5 Simulation of examples 

In this section the general methodology of the previous section is applied to the Strauss 
process from Section[3]on 5 = [0, l] 2 with A equal to Lebesgue measure. Notice that as 
R goes to zero, the number of points that fit inside S without overlap goes to infinity, 
so this setup can be used to understand the Strauss process as the window grows to 
R 2 . 

Strauss process When a point dies in the dominating process, it dies in any X £ 

[Lpf, Un] so it can be safely removed from both and maintain the sandwich property. 

The interesting cases happen when a point is born. The following pseudocode reflects 

this structure. 

Strauss bounding swap chain 

Input: move, M, L, U, p SW ap Output: L, U 

1) If move = death of point w 

2) Let L <- L - w, let U <- U - w 

3) Else (there was birth at point v) 

4) Let A L = {w e L : p(w, v) < R}, let A v = {w G U : p(w, v) < R} 

5) For each w G Ajj do 

6) If M < p 2 

7) Let Au<-Au\ {w}, let A L ^ A L \ {w}, 

8) Let M M//3 2 

9) Else 

10) Let M<- (l-M)/(l- ( 5 2 ) 

11) Let [L,U] <- Birth update(u, A L , Au,p swap ) 



11 



To compare the swap chain with the no swap chain, it will be convenient to allow 
the Birth update procedure to not always execute the swap. When it is possible to 
execute the swap, a Bernoulli with parameter p SW ap will be rolled. Only if this Bernoulli 
is 1 will the swap be executed. Therefore this new chain is a mixture of the no swap 
chain and the swap chain. When p swa p = 0, it is the original chain with no swap, when 
Pswap = 1 it is the new chain that always swaps when possible. 
Birth Update 

Input: v,A L> A U: p SW ap Output: L, U 

1) Draw S <- Bem(p swap ) 

2) If S = 0, #A L = 0, #A V = 0, then L^LU {v}, U <- U U {v} 

3) If S = 0, #A L = 0, #A V > 1 then U <- U U {v} 

4) If S = 1, #A L = 0, #A V = 0, then L^LU {v}, U <- U U {v} 

5) If 5 = 1, #A L = 0, #A V = 1, then L<-LU {v}, U <- U U {v} \ A v 

6) If S = 1, #A L = 0, #A V > 1, then U <- U U {v} 

7) If S = 1, #A L = 1, #^c/ = 1, then L ^- L U {?;} \ U <- U U {u} \ A v 

8) If 5 = 1, = 1, #,4^ > 1, then L <- L \ A L , U <- U U {v} 

Theorem 5.1. The above update satisfies the sandwich property. 

Proof. For deaths, as noted earlier the sandwich property follows immediately. For 
births, several cases must be considered. First let S = (the no swap case). 

Line 2: #j4l = 0, #Ajj = 0. Then for all Y E [L, U], v has no neighbors within 
distance R, and so it is added to Y. Therefore it should be added to both L and U. 

Line 3: #Al = 0, #Ajj > 1. Then for Y = L, the point v will be added, but for 



Y = U, the point v is not. Therefore using the guidelines of Section |47ij v is added to 
U but not to L. 

Other case: #Al > 0, Then for all Y G [L,U], the point v will not be born, so 
both L and U remain unchanged, so no line of the code represents this case. 
Now suppose S = 1 (so a swap move occurs if possible.) 

Line 4: #^4_l = 0, #A(j = 0. In this case, for all Y G [L, U], the birth is accepted, 
and so v is added to both L and U. 

Line 5: #A L = 0, #A V = 1. Let Y G [L, U], and say {w} = Ajj. Ifw^Y then v 
is just born. However if w E Y, then w is removed from Y and v is added. Hence w 
should be removed from U and v added to L and U. 

Line 6: #^4l = 0, #Ajj > 1. For Y = L, v will be born, so must be added to U. 
However, for Y = U, v is not added, so L remains unchanged. 

Line 7: #A L = 1, #A V = 1. This means A L = A U: so for all Y G [L, U], the single 
point in Al is removed and v is added. Hence this is done for both L and U. 

Line 8: #Al = 1, j^Ajj > 1. For Y = L, the point v is added and the point in Al 
is removed. So v must be added to U and ^4^ removed from L. For Y = U, the point 
v is neither added nor is Al removed, so v is not added to L and Al is not removed 
from U. 

Other case: #Al > 1. Then for all Y G [L, U] no swap or birth occurs, so both L 
and U remain unchanged, and no line is needed. 

□ 
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Figure [4] shows the advantage gained by using the swap move. The times are 
measured in number of events generated by the algorithm in order to be comparable 
with the times needed for a regular Markov chain approach. On the left are the raw 
number of times for the chain with no swap (p SW ap = 0) and with the swap (p SW ap = 1)- 
The plot on the right shows the ratio of these two times. Note that as j3i gets larger, 
the speedup gained by using the swap move also increases. 



Running time for swap and no swap chains 




50 100 150 200 250 



Figure 4: Running time of dCFTP for Strauss model on S = [0, l] 2 , (3 2 = .5, R = .1. 



6 Running time 

Now consider how many events must be generated before the dominated coupling from 
the past procedure terminates. Deaths in Upf(t) — Ljy(t) cause the bounding process 
to move together, while births can add a point to C7jv(*) but not to Lj^{t). Therefore it 
is reasonable that the perfect simulation algorithm will run faster in situations where 
the birth rate is low. If N events were generated backwards in time, let Un and 
Ljv denote the corresponding bounding process. Recall that if Un(0) = Ljv(0), then 
U N >(0) = L N ,(0) for all N' > N. 

To develop the best theoretical bound on the mixing time of the Markov chain, it 
turns out (as in [7]) to be useful not to always execute the swap move. Suppose that 
when the possibility of a swap exists, the swap move is only executed with probability 
1/4. This can be viewed as a mixture of the original and swap move chains. 

Theorem 6.1. Suppose that N events are generated backwards in time and then run 
forward to get Un(0) and Ljv(0). Let B(v,R) denote the area within distance R of 
v G S, let r = sup vGS X(B(v, R)). 

If (3\(1 — /?2) r < I; then for the chain without the swap move 

P(U N (0) ^ Ljv(0)) < 2exp(-.09iV) + fii\(S) exp(-iV(l - - p 2 )r)/(4(3 1 X(S)). 

(16) 
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If — fi2)r < 2, then for the chain where a swap is executed with probability l/4 ; 



P(U N (0) ^ L N (0)) < 2exp(-.09iV) + /3iA(5) exp(-JV(l - .5^(1 



f3 2 )r)/m\(S)). 

(17) 



Without a swap move, it is possible to prove that the dominated CFTP algorithm 
runs in polynomial time when — @2)r < 1. Use of a swap move with 1/4 probability 
yields a better analysis: for fixed (5 2 and r, the algorithm is guaranteed to run in 
polynomial time over twice the range of /3\ as before, for /3i(l — f$2)r < 2. When 
dominated CFTP runs in polynomial time, the underlying Markov chain is also rapidly 
mixing. 

Why the value of 1 /4 for the probability? This is an artifact of the proof technique. 
The theorem only gives sufficient, not necessary conditions for the algorithm to be fast, 
and simulation experiments indicate that the algorithm actually takes the fewest steps 
when the swap moves are used as often as possible (reasons why this is true are noted 
in the proof of the theorem.) 



Theorem 6.1 has immediate consequences for the expected running time of domi- 
nated CFTP. Recall that in dCFTP the number of events was doubled each time. Say 
P(C/jv(0) 7^ Ljv(O)) < aexp(— bN), and let T be the number of events generated in a 
call of dCFTP. Then for T > t, dCFTP must have failed on a run of length at least 
t/2. So 



E[T] = jr P(T > N) < 



N=l 



\(2/b)\na\ 
N=l 



+ aexp(-WV72), (18) 

N=\(2/b) In a] 



which makes E[T] = 0(lna/6), and the mean running time 0((3i\(S)(ln f3i\(S))) for 
the no swap chain when /3i(l— Z^)?* < 1 and in the 1/4-swap chain when /3i(l— /^r < 2. 



Proof of Theorem 6.1 As in the description of dCFTP in Section |4| the algorithm 
begins by generating a Poisson spatial point process with parameter j3\\{S) at time 0, 
generate birth-death events backwards in time. Then Un becomes this state, 
is the empty configuration, and the bounding processes are run forward in time. Let 
Q(t) = Ujsr(t) \ Ln[£). Then the chains have come together if and only if #Q(0) = 0. 



Strauss no swap move Here Pswap — 0. All individual death rates are 1, so the rate 
of deaths of points in Q(t) is just #Q(t). Each death reduces Q(t) in size by 1. Call 
this a good event. So the rate of good events is jf=Q(t). 

For #Q(t) to increase by 1 (call this a bad event), a birth must occur at v and be 
added to t/jy(t) but not L^{t). Let w be any point in Q(t). Then for Q{t) to give 
rise to another point in Q(t), a point v must be born within distance R of w and the 
Bern(/32) roll must be 0. The area surrounding w is at most r, and the Bernoulli roll 
acts as a thinning procedure in a Poisson process. Therefore the rate at which w is 
creating new points in Q{t) is at most j3\{l — fizjr, and the overall rate of bad events 
is at most /3i(l - (3 2 )r#Q(t). 

Suppose the rate of bad events is smaller than the rate of good events. The prob- 
ability that one event occurs in the time interval from t to t + h is proportional to h, 
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the probability that n events occurs is 0{h n ). Hence 

oo 

E[E[#Q(t + h)\U(t), L{t)\ - #Q(t)\ < E[(#Q(i)/3i(l - fo)r - #Q(t))h + ^ iO(h% 

i=2 

which means 

Km EW#Qlt + h)W), m -Mt)] < _ E[#W)(1 _ A(1 _ AW] . 

Let g(t) = E[#Q(i)], and let ttv be the time of the iVth event moving backwards in time. 
Then q(r N ) < E[#D(t n )} = PiX(S), so together with q'(t) < -q(t){l - /?i(l - /3 2 )r): 

q(t) < Pi\(S) exp(-i(l - A(l - /3 2 )r)). 

By Markov's inequality, P(Q(0) / 0) = P(#Q(0) > 1) < g(0). 

Now fix iV, the number of events to run back in time, and set t = N/[4/3iX(S)]. The 
probability that Q(0) does not equal starting at —t is at most exp(— A/'/[4/3iA(S')](l — 

Using Chernoff bounds [5], it can be shown that for A ~ Pois(a), P(j4 > 2a) < 
exp(— a(21n2 — 2 + 1)). So after t time, the probability that more than N/2 events were 
generated in a Poisson process with rate f3i\(S) is at most exp(— (N/A)(2 In 2 — 2 + 1)). 
Both the times of the births and times of deaths (viewed individually) are Poisson 
processes with rate f3iX(S), therefore the probability that either uses more than N/2 
events (by the union bound) is at most 2 exp(— .09iV). But if at this time each process 
used at most N/2 events, then moving back in time TV" events puts the user even farther 
back in time, and if coalescence occurs at —t, it will also occur starting at tn- Again 
using the union bound, the probability of failure is at most 

2exp(-.09A0 +exp(-iV(l - - lh)r)). 

Strauss with swap move Now consider what happens when p SW ap > 0. The rate of 
good events (deaths) remains unchanged, but the rate of bad events changes. Going 
back to Birth Update, lines 2, 4, and 7 leave #Q(t) unchanged, lines 3 and 6 increase 
#Q(t) by 1, line 5 decreases #Q(t) by 1, and line 8 increases #Q(t) by 2. Let 63 denote 
the area of the region where a birth triggers line 3, with 64,65,66; an d 6s defined 
similarly. Line 3 is only activated when a swap does not occur, and lines 4 through 8 
only occur when a swap does occur. Putting all this together, the rate of change from 
births is: 

(1 -Pswap)b3 + yWp6 5 (-l) +Pswapb6 + Pswaph (2) . 

Note that 63 = 65+66 since the conditions on lines 5 and lines 6, neglecting the condition 
on S, form a partition of the conditions in line 3. So the rate can be simplified to 

(1 - 2p swap )b 5 + 6 6 + 2p swap b 8 . 

Now, in line 6, #Au — j^A^ > 2, so any point triggering these lines must be within 
distance R of at least two points in Q(t). Points in 65 or 6s must be within distance 



15 



R of at least one point in Q(t). Since each point in Q(t) is adjacent to area at most r, 
this means 65 + 2b& + bg < #Q(t)r. 

The variable p swap can be set to any number from to 1 , but setting it to p swap = 
1/4 gives an upper bound on the bad event rate of (l/2)&5+&6 + (l/2)&8 < (l/2)#Q(i)r. 
Note that if 65 > bg, then it makes sense to make p swap = 1 to drive the bad event rate 
as small as possible. While simulations indicate that p SW a P = 1 is the best choice in 
practice, setting p swap = 1/4 gives the best analysis of this algorithm. 

Recall the bad event rate when p swap = was bounded above by #Q(t)r. With 
Pswap = 1/4, the bad event rate is bounded above by #Q(t)r/2, and this factor of two 
carries throughout the remainder of the proof to give ( |17[ ) . □ 

Haggstrom and Steif gave a result similar to the previous theorem for Unitary 
codings for high noise Markov random fields |llj . but their analysis involves moving 
backwards rather than forwards in time, and they did not employ the swap move. 

Figure [5] illustrates the mean run time for a fixed value of A as the probability of 
a swap varies from p = up to p = 1. The running time (as measured by generated 
iterations) decreases as the chance of swapping increases. This same phenomenon was 
noted for the hard-core gas model on graphs in |15j . 
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Figure 5: Expected iterations for A = 40 with varying probability of swapping 



7 Conclusions 

The regular birth death chains only move when no point blocks the birth of a point 
in the dominating process. The birth death swap chains move when at most one 
point blocks the birth of a point in the dominating process. This alone means that 
more moves are being taken, and helps to explain the improved analysis and improved 
performance when used for perfect sampling with dominated coupling from the past. 
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